GE Aviation - Remaining Useful Life Analysis

Part 2 - Data Overview

Author

Linh Tran

Read the Data

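import pandas as pd
# pandas_profiling has since been renamed ydata_profiling (from ydata_profiling import ProfileReport)
from pandas_profiling import ProfileReport
# Raw string so the Windows backslashes are not read as escape sequences
df = pd.read_csv(r"D:\School\FL 2022\ISA 401\GE\ge_data.csv")
df.info()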
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 36 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   dataset               100 non-null    object 
 1   esn                   100 non-null    int64  
 2   unit                  100 non-null    int64  
 3   operator              100 non-null    object 
 4   last_flight_cycle     100 non-null    int64  
 5   last_datetime         100 non-null    object 
 6   mean_tra              100 non-null    int64  
 7   mean_t2               100 non-null    float64
 8   mean_t24              100 non-null    float64
 9   mean_t30              100 non-null    float64
 10  mean_t50              100 non-null    float64
 11  mean_p2               100 non-null    float64
 12  mean_p15              100 non-null    float64
 13  mean_p30              100 non-null    float64
 14  mean_nf               100 non-null    float64
 15  mean_nc               100 non-null    float64
 16  mean_epr              100 non-null    float64
 17  mean_ps30             100 non-null    float64
 18  mean_phi              100 non-null    float64
 19  mean_nrf              100 non-null    float64
 20  mean_nrc              100 non-null    float64
 21  mean_bpr              100 non-null    float64
 22  mean_farb             100 non-null    float64
 23  mean_htbleed          100 non-null    float64
 24  mean_nf_dmd           100 non-null    int64  
 25  mean_pcnfr_dmd        100 non-null    int64  
 26  mean_w31              100 non-null    float64
 27  mean_w32              100 non-null    float64
 28  mean_X44321P02_op016  100 non-null    float64
 29  mean_X44321P02_op420  100 non-null    float64
 30  mean_X54321P01_op116  100 non-null    float64
 31  mean_X54321P01_op220  100 non-null    float64
 32  mean_X65421P11_op232  100 non-null    float64
 33  mean_X65421P11_op630  100 non-null    float64
 34  total_distance        100 non-null    float64
 35  rul                   100 non-null    int64  
dtypes: float64(26), int64(7), object(3)
memory usage: 28.2+ KB
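The dtype tally on the last lines of the info() output can also be computed directly. This makes it easy to separate the three text columns (dataset, operator, last_datetime) from the numeric ones. A minimal sketch, using a small made-up frame that borrows a few real column names; the values are illustrative only:

```python
import pandas as pd

# Small made-up frame that borrows a few real column names; the values are
# illustrative only and are NOT taken from ge_data.csv.
toy = pd.DataFrame({
    "dataset": ["d1", "d1", "d2"],
    "esn": [1, 2, 3],
    "operator": ["A", "B", "A"],
    "last_datetime": ["2022-01-01", "2022-01-02", "2022-01-03"],
    "mean_t24": [642.1, 642.9, 643.4],
    "rul": [112, 98, 69],
})

# Same kind of tally as the "dtypes:" line printed by df.info()
dtype_counts = toy.dtypes.value_counts()
print(dtype_counts)

# Non-numeric (text) columns vs numeric columns, in their original order
categorical_cols = toy.select_dtypes(exclude="number").columns.tolist()
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
print(categorical_cols)  # ['dataset', 'operator', 'last_datetime']
print(numeric_cols)      # ['esn', 'mean_t24', 'rul']
```

exclude="number" is used instead of include="object" so the split still holds if a newer pandas version infers a dedicated string dtype for the text columns.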

Profile Report

profile = ProfileReport(df)
profile.to_notebook_iframe()
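The overview alerts in the profile report (missing cells, zeros, duplicate rows, constant columns) can be cross-checked in plain pandas. The sketch below is a stand-in under stated assumptions, not how pandas_profiling computes its alerts; profile_alerts is a hypothetical helper and the toy values are invented:

```python
import pandas as pd


def profile_alerts(frame: pd.DataFrame) -> dict:
    """Hypothetical helper: plain-pandas versions of a few profile-report alerts."""
    missing = frame.isna().sum()
    zeros = (frame.select_dtypes(include="number") == 0).sum()
    return {
        # columns with at least one missing cell -> count of missing cells
        "missing": missing[missing > 0].to_dict(),
        # numeric columns with at least one exact zero -> count of zeros
        "zeros": zeros[zeros > 0].to_dict(),
        # number of rows that exactly repeat an earlier row
        "duplicate_rows": int(frame.duplicated().sum()),
        # columns holding a single distinct value (NaN counted as a value)
        "constant": [c for c in frame.columns if frame[c].nunique(dropna=False) == 1],
    }


# Made-up values for illustration only
toy = pd.DataFrame({
    "esn": [1, 2, 3, 4],
    "mean_tra": [100, 100, 100, 100],
    "mean_t24": [642.1, 642.9, 643.4, 642.5],
    "rul": [112, 98, 69, 0],
})

alerts = profile_alerts(toy)
print(alerts)
```

Run on the real df, the "missing", "zeros" and "duplicate_rows" entries would be expected to come back empty or zero, and "constant" should include the seven constant columns named in the summary.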

Summary

  • There are no missing values, zero values, or duplicate rows in the data.

  • Variables with a constant value, which will therefore be dropped:

    • mean_tra

    • mean_t2

    • mean_p2

    • mean_epr

    • mean_farb

    • mean_nf_dmd

    • mean_pcnfr_dmd

  • Variables that contribute little or no information to the model, which will also be dropped:

    • dataset

    • esn

    • unit

    • last_datetime

  • mean_p15 takes values that are very close to one another, which might make it highly correlated with many other variables. Because it carries little meaningful information, this variable could be dropped.

  • mean_t24 is highly correlated with operator, and its distribution is quite skewed, so I considered dropping this variable as well.
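The column drops listed above, plus quick numeric checks behind the mean_p15 and mean_t24 remarks, can be sketched as follows. Assumptions: drop_uninformative and screen are hypothetical helpers, the toy values are invented, the coefficient of variation stands in for "values very close to each other", and the correlation ratio (eta) is a swapped-in measure of association between the categorical operator and the numeric mean_t24; the profile report may use a different measure.

```python
import pandas as pd

# Columns the summary lists as constant or uninformative
CONSTANT_COLS = ["mean_tra", "mean_t2", "mean_p2", "mean_epr",
                 "mean_farb", "mean_nf_dmd", "mean_pcnfr_dmd"]
ID_COLS = ["dataset", "esn", "unit", "last_datetime"]


def drop_uninformative(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop the listed columns after a constancy check."""
    varying = [c for c in CONSTANT_COLS if c in frame and frame[c].nunique() > 1]
    if varying:
        raise ValueError(f"listed as constant but not constant: {varying}")
    # errors="ignore" skips listed columns that are absent from this frame
    return frame.drop(columns=CONSTANT_COLS + ID_COLS, errors="ignore")


def screen(frame: pd.DataFrame) -> dict:
    """Hypothetical helper: quick numbers behind the mean_p15 / mean_t24 notes."""
    p15 = frame["mean_p15"]
    t24 = frame["mean_t24"]
    # Coefficient of variation: a tiny value means the column barely changes
    cv_p15 = p15.std() / p15.mean()
    # Sample skewness (about 0 for a symmetric distribution, > 0 for a right tail)
    skew_t24 = t24.skew()
    # Correlation ratio eta: sqrt(between-group SS / total SS) of mean_t24 by operator
    grand = t24.mean()
    groups = t24.groupby(frame["operator"])
    between = (groups.count() * (groups.mean() - grand) ** 2).sum()
    total = ((t24 - grand) ** 2).sum()
    eta = (between / total) ** 0.5
    return {"cv_mean_p15": cv_p15, "skew_mean_t24": skew_t24, "eta_operator_t24": eta}


# Made-up values for illustration only
toy = pd.DataFrame({
    "dataset": ["d1"] * 6,
    "esn": [1, 2, 3, 4, 5, 6],
    "unit": [1, 2, 3, 4, 5, 6],
    "operator": ["A", "A", "A", "B", "B", "B"],
    "last_datetime": ["2022-01-01"] * 6,
    "mean_tra": [100] * 6,
    "mean_p15": [21.60, 21.61, 21.60, 21.61, 21.60, 21.61],
    "mean_t24": [642.0, 642.1, 642.2, 643.0, 643.1, 645.0],
    "rul": [120, 110, 90, 60, 40, 10],
})

cleaned = drop_uninformative(toy)
stats = screen(cleaned)
print(cleaned.columns.tolist())  # ['operator', 'mean_p15', 'mean_t24', 'rul']
print(stats)
```

On the full df, drop_uninformative would leave 25 of the 36 columns. An eta close to 1 would mean the operator groups explain most of the variation in mean_t24, which is the kind of dependence the note above describes.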